Blog Classification: Adding Linguistic Knowledge to Improve the K-NN Algorithm

نویسندگان

  • Ines Bayoudh
  • Nicolas Béchet
  • Mathieu Roche
چکیده

Blogs are interactive and regularly updated websites which can be seen as diaries. These websites are composed by articles based on distinct topics. Thus, it is necessary to develop Information Retrieval approaches for this new web knowledge. The first important step of this process is the categorization of the articles. The paper above compares several methods using linguistic knowledge with k-NN algorithm for automatic categorization of weblogs articles.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Coupling K-nearest Neighbors with Logistic Regression in Case-based Reasoning

Case-based reasoning (CBR) systems use similarity functions to solve new problems with past situations. K-nearest neighbors algorithm (K-NN) have been used in CBR systems to define new cases status according to characteristics of past nearest cases. We proposed a new hybrid approach combining logistic regression (LR) with K-NN to optimize CBR classification. First, we analyzed the knowledge dat...

متن کامل

A comparative study of performance of K-nearest neighbors and support vector machines for classification of groundwater

The aim of this work is to examine the feasibilities of the support vector machines (SVMs) and K-nearest neighbor (K-NN) classifier methods for the classification of an aquifer in the Khuzestan Province, Iran. For this purpose, 17 groundwater quality variables including EC, TDS, turbidity, pH, total hardness, Ca, Mg, total alkalinity, sulfate, nitrate, nitrite, fluoride, phosphate, Fe, Mn, Cu, ...

متن کامل

Optimized Seizure Detection Algorithm: A Fast Approach for Onset of Epileptic in EEG Signals Using GT Discriminant Analysis and K-NN Classifier

Background: Epilepsy is a severe disorder of the central nervous system that predisposes the person to recurrent seizures. Fifty million people worldwide suffer from epilepsy; after Alzheimer’s and stroke, it is the third widespread nervous disorder.Objective: In this paper, an algorithm to detect the onset of epileptic seizures based on the analysis of brain electrical signals (EEG) has b...

متن کامل

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

متن کامل

Improvement of the Effective Components in the PDR Positioning Method Based on Detecting the User’s Movement Mode Using Smartphone Sensors

The purpose of this paper is to evaluate and improve the accuracy of indoor positioning using smartphone sensors based on Pedestrian Dead Reckoning (PDR) method. In some specific situations, such as fires or power outages that disable infrastructure-based positioning techniques, using PDR method based on smartphone sensors that perform positioning continuously is a good solution.This paper focu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008